The objective of this analysis is to identify trends in attendance and to assess the performance of home teams using the FIFA World Cup matches dataset. The dataset is an archive of past World Cup matches. Each row records the year and date, stage, stadium and city, the home and away teams and their goals (at half time and full time), attendance, and the match officials. After loading and summarizing the data, we present two views: an interactive bar plot of home team goals by team and a summary of total attendance per tournament. Together, these short analyses aim to expose the recurring patterns behind the World Cup's popularity and its goal scoring.
Because the FIFA World Cup dataset is large (`str()` reports 4,572 rows and 20 variables), a first exploratory pass helps before any detailed analysis. For this preliminary look we use a bar plot of home team goals for each home team. It gives a quick view of goal-scoring tendencies, points out teams that score heavily as the home side, and hints at possible trends. Each team is drawn in its own color from the Set2 palette. Set2 has only eight colors and the dataset contains many more teams, so RColorBrewer emits warnings and different teams can end up with similar colors. This initial pass describes the goal dynamics in the matches dataset and lays the groundwork for more detailed analysis.
# Read the FIFA World Cup matches dataset
fifa_matches <- read.csv("/Users/jeeveshrajgupta/Desktop/FP_Jeevesh/WorldCupMatches.csv")
# Display the structure of the dataset
str(fifa_matches)
## 'data.frame': 4572 obs. of 20 variables:
## $ Year : int 1930 1930 1930 1930 1930 1930 1930 1930 1930 1930 ...
## $ Datetime : chr "13 Jul 1930 - 15:00 " "13 Jul 1930 - 15:00 " "14 Jul 1930 - 12:45 " "14 Jul 1930 - 14:50 " ...
## $ Stage : chr "Group 1" "Group 4" "Group 2" "Group 3" ...
## $ Stadium : chr "Pocitos" "Parque Central" "Parque Central" "Pocitos" ...
## $ City : chr "Montevideo " "Montevideo " "Montevideo " "Montevideo " ...
## $ Home.Team.Name : chr "France" "USA" "Yugoslavia" "Romania" ...
## $ Home.Team.Goals : int 4 3 2 3 1 3 4 3 1 1 ...
## $ Away.Team.Goals : int 1 0 1 1 0 0 0 0 0 0 ...
## $ Away.Team.Name : chr "Mexico" "Belgium" "Brazil" "Peru" ...
## $ Win.conditions : chr " " " " " " " " ...
## $ Attendance : int 4444 18346 24059 2549 23409 9249 18306 18306 57735 2000 ...
## $ Half.time.Home.Goals: int 3 2 2 1 0 1 0 2 0 0 ...
## $ Half.time.Away.Goals: int 0 0 0 0 0 0 0 0 0 0 ...
## $ Referee : chr "LOMBARDI Domingo (URU)" "MACIAS Jose (ARG)" "TEJADA Anibal (URU)" "WARNKEN Alberto (CHI)" ...
## $ Assistant.1 : chr "CRISTOPHE Henry (BEL)" "MATEUCCI Francisco (URU)" "VALLARINO Ricardo (URU)" "LANGENUS Jean (BEL)" ...
## $ Assistant.2 : chr "REGO Gilberto (BRA)" "WARNKEN Alberto (CHI)" "BALWAY Thomas (FRA)" "MATEUCCI Francisco (URU)" ...
## $ RoundID : int 201 201 201 201 201 201 201 201 201 201 ...
## $ MatchID : int 1096 1090 1093 1098 1085 1095 1092 1097 1099 1094 ...
## $ Home.Team.Initials : chr "FRA" "USA" "YUG" "ROU" ...
## $ Away.Team.Initials : chr "MEX" "BEL" "BRA" "PER" ...
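Not every one of the 4,572 rows reported by `str()` needs to be a real match. If the CSV contains rows with nothing but separators, `read.csv()` reads them as rows of missing values. The sketch below shows this on a small inline CSV. It assumes that empty rows look like `,,,` and that a missing `Year` marks such a row. The values are illustrative and are not taken from the real file.

```r
# Hypothetical miniature of the CSV: two match rows followed by two rows
# that contain only separators (an assumption about how empty rows look)
csv_text <- paste(
  "Year,Home Team Name,Home Team Goals,Attendance",
  "1930,France,4,4444",
  "1930,USA,3,18346",
  ",,,",
  ",,,",
  sep = "\n"
)
matches <- read.csv(text = csv_text)
print(nrow(matches))  # 4: the separator-only rows are read as data
# Treat a missing Year as the marker of an empty row and drop those rows
matches <- matches[!is.na(matches$Year), ]
print(nrow(matches))  # 2
```

The same filter, `fifa_matches[!is.na(fifa_matches$Year), ]`, could be applied right after loading so that later summaries and plots only see real matches.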
# read.csv() converts spaces in column names to dots (check.names = TRUE),
# so the columns must be referenced by their converted names
summary_columns <- c("Home.Team.Goals", "Away.Team.Goals", "Attendance")
# Warn about any requested columns that are not in the dataset
missing_columns <- setdiff(summary_columns, colnames(fifa_matches))
if (length(missing_columns) > 0) {
  warning("The following columns are missing in the dataset: ",
          paste(missing_columns, collapse = ", "))
}
# Generate summary statistics for the columns that exist;
# drop = FALSE keeps a data frame even if only one column remains
existing_columns <- intersect(summary_columns, colnames(fifa_matches))
summary(fifa_matches[, existing_columns, drop = FALSE])
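The goal columns appear as `Home.Team.Goals` and `Away.Team.Goals` in the `str()` output. This happens because `read.csv()` passes header names through `make.names()` by default. Rather than typing the dotted names by hand, the human-readable names can be converted the same way. The following is a minimal sketch on an inline two-row CSV with illustrative values.

```r
# read.csv() passes header names through make.names() by default
# (check.names = TRUE), so "Home Team Goals" becomes "Home.Team.Goals"
wanted <- c("Home Team Goals", "Away Team Goals", "Attendance")
summary_columns <- make.names(wanted)
print(summary_columns)

# Inline two-row CSV with the original, space-separated headers
csv_text <- "Home Team Goals,Away Team Goals,Attendance\n4,1,4444\n3,0,18346"
matches <- read.csv(text = csv_text)
print(setdiff(summary_columns, colnames(matches)))  # character(0): none missing
summary(matches[, summary_columns])
```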
# Convert 'Datetime' column to a Date format
fifa_matches$Datetime <- as.Date(fifa_matches$Datetime, format = "%d %b %Y - %H:%M")
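The conversion above keeps only the calendar date. The format string handles the `" - HH:MM"` part, and any trailing input, such as the trailing space visible in the `str()` output, is ignored. The following sketch runs on two sample strings in the same shape. It assumes an English or C time locale, because `%b` month abbreviations are locale-dependent, and it assumes UTC when the kick-off time is kept, because the column carries no time zone.

```r
# Month abbreviations (%b) are locale-dependent; this sketch assumes the
# C locale so that "Jul" is recognised
invisible(Sys.setlocale("LC_TIME", "C"))
# Sample strings in the shape shown by str(), including the trailing space
datetimes <- c("13 Jul 1930 - 15:00 ", "14 Jul 1930 - 12:45 ")
# as.Date() keeps only the calendar date; trailing input is ignored
dates <- as.Date(datetimes, format = "%d %b %Y - %H:%M")
print(dates)
# To keep the kick-off time, parse to POSIXct instead (UTC assumed,
# since the column carries no time zone)
kickoff <- as.POSIXct(datetimes, format = "%d %b %Y - %H:%M", tz = "UTC")
print(format(kickoff, "%H:%M"))
```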
[Figure: bar plot of FIFA World Cup attendance over time; ggplot2 removed 3,722 rows with missing values.]
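A bar plot of total attendance per tournament can be sketched in base R by summing attendance per year first. The formula interface of `aggregate()` omits rows with missing values by default. As a result, empty rows neither trigger missing-value warnings nor form an NA group. The data frame below is illustrative and is not taken from the dataset. Its last row mimics an empty CSV row.

```r
# Hypothetical match-level rows in the shape of fifa_matches
# (illustrative values; the last row mimics an empty CSV row)
matches <- data.frame(
  Year       = c(1930L, 1930L, 1934L, 1934L, NA),
  Attendance = c(4444L, 18346L, 30000L, NA, NA)
)
# The formula method of aggregate() uses na.action = na.omit by default,
# so rows with a missing Year or Attendance are dropped before summing
totals <- aggregate(Attendance ~ Year, data = matches, FUN = sum)
print(totals)
# One bar per tournament year
barplot(totals$Attendance, names.arg = totals$Year,
        xlab = "Year", ylab = "Total attendance",
        main = "FIFA World Cup attendance by tournament")
```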
library(plotly)
# Bar plot of home team goals per match, one color per home team.
# The categorical x-axis is labeled by plotly itself: tick positions taken
# from row numbers would not line up with the team categories.
plot_ly(data = fifa_matches, x = ~Home.Team.Name, y = ~Home.Team.Goals,
        type = 'bar', color = ~Home.Team.Name) %>%
  layout(title = "Home Team Performances",
         xaxis = list(title = "Home team", tickangle = 45),
         yaxis = list(title = "Home team goals"))
## Warning: Ignoring 3720 observations
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
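The plot receives one row per match, so on its own it does not state each team's total or average number of home goals. One way to make those per-team figures explicit is to aggregate before plotting. The following is a minimal base-R sketch that uses illustrative rows rather than the real data. The row with an empty team name mimics an empty CSV row.

```r
# Hypothetical match-level rows (illustrative values, not the real dataset)
matches <- data.frame(
  Home.Team.Name  = c("France", "USA", "France", "Brazil", ""),
  Home.Team.Goals = c(4L, 3L, 1L, 2L, NA),
  stringsAsFactors = FALSE
)
# Keep only rows with a recorded home-team score
played <- matches[!is.na(matches$Home.Team.Goals), ]
# Total home goals and number of home matches per team
totals <- tapply(played$Home.Team.Goals, played$Home.Team.Name, sum)
counts <- tapply(played$Home.Team.Goals, played$Home.Team.Name, length)
per_team <- data.frame(team = names(totals),
                       total_goals = as.vector(totals),
                       matches = as.vector(counts),
                       stringsAsFactors = FALSE)
per_team$goals_per_match <- per_team$total_goals / per_team$matches
# Highest-scoring teams first
per_team <- per_team[order(-per_team$total_goals), ]
print(per_team)
```

The resulting `per_team` data frame could then be passed to `plot_ly(per_team, x = ~team, y = ~total_goals, type = "bar")` to draw one bar per team.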
library(dplyr)
library(gt)
# Calculate total attendance for each tournament year,
# dropping the empty rows that have no Year
total_attendance <- fifa_matches %>%
  filter(!is.na(Year)) %>%
  group_by(Year) %>%
  summarise(Total_Attendance = sum(Attendance, na.rm = TRUE))
# Print the table (gt >= 0.3.0 takes bare column names instead of vars())
total_attendance %>%
  gt() %>%
  tab_header(title = "Total Attendance Summary") %>%
  cols_label(Total_Attendance = "Total attendance") %>%
  fmt_number(columns = Total_Attendance, decimals = 0)

Total Attendance Summary

| Year | Total attendance |
|---|---|
| 1930 | 590,549 |
| 1934 | 363,000 |
| 1938 | 375,700 |
| 1950 | 1,045,246 |
| 1954 | 768,607 |
| 1958 | 819,810 |
| 1962 | 893,172 |
| 1966 | 1,563,135 |
| 1970 | 1,603,975 |
| 1974 | 1,865,753 |
| 1978 | 1,545,791 |
| 1982 | 2,109,723 |
| 1986 | 2,394,031 |
| 1990 | 2,516,215 |
| 1994 | 3,587,538 |
| 1998 | 2,785,100 |
| 2002 | 2,705,197 |
| 2006 | 3,359,439 |
| 2010 | 3,178,856 |
| 2014 | 4,319,243 |
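Totals like these are only as reliable as the match records behind them. If the file contained repeated rows for the same match, the affected tournaments would be over-counted. The following is a minimal sketch that de-duplicates on `MatchID` before summing. It assumes that `MatchID` uniquely identifies a match, and the values are illustrative.

```r
# Hypothetical rows: MatchID 101 appears twice (illustrative values)
matches <- data.frame(
  Year       = c(2014L, 2014L, 2014L, 2010L),
  MatchID    = c(101L, 101L, 102L, 201L),
  Attendance = c(60000L, 60000L, 40000L, 50000L)
)
# Keep the first occurrence of each MatchID
# (assumes MatchID identifies a single match)
unique_matches <- matches[!duplicated(matches$MatchID), ]
# Sum attendance per year over the de-duplicated matches
totals <- aggregate(Attendance ~ Year, data = unique_matches, FUN = sum)
print(totals)
```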
The attendance figures show an overall upward trend across tournaments. Totals stay below 600,000 in the 1930s, pass one million for the first time in 1950, remain above two million from 1982 onward, jump to about 3.59 million in 1994, and reach the highest value in the table, about 4.32 million, in 2014. The growth is not uninterrupted: totals fall in 1954, 1978, 1998, 2002 and 2010. This long-run growth is consistent with the tournament's widening global audience. The interactive bar plot of home team goals complements this by showing how goal scoring is distributed across teams, with established soccer powers such as Brazil, Germany and Italy standing out. These plots alone do not establish a home advantage, however. The Home.Team.Name column records the first-listed team rather than the host nation: France, for example, is listed as the home team against Mexico in Montevideo in 1930. Any claim that host countries score more at home would therefore need a direct comparison of host and non-host matches. Taken together, these views give a first picture of how the popularity and scoring dynamics of the FIFA World Cup have changed over time.
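A host-versus-non-host comparison could be sketched as follows. The approach joins each match to a lookup of host nations and compares mean home-team goals. The `hosts` table is hypothetical and is not part of WorldCupMatches.csv. The match values are illustrative and are not taken from the dataset.

```r
# Hypothetical match rows (illustrative values, not from the dataset)
matches <- data.frame(
  Year            = c(1930L, 1930L, 1934L, 1934L),
  Home.Team.Name  = c("Uruguay", "France", "Italy", "USA"),
  Home.Team.Goals = c(4L, 4L, 7L, 1L),
  stringsAsFactors = FALSE
)
# Hypothetical lookup of host nations; not part of WorldCupMatches.csv
hosts <- data.frame(Year = c(1930L, 1934L), Host = c("Uruguay", "Italy"),
                    stringsAsFactors = FALSE)
matches <- merge(matches, hosts, by = "Year")
# TRUE when the listed home team is that tournament's host
matches$is_host <- matches$Home.Team.Name == matches$Host
# Mean home-team goals for host and non-host home teams
host_effect <- aggregate(Home.Team.Goals ~ is_host, data = matches, FUN = mean)
print(host_effect)
```

On the real data, team names would need to match the host names exactly, so naming differences between the two tables would have to be reconciled first.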